Key-concept extraction from French articles with KX
Abstract
We present an adaptation of KX, a system for multilingual unsupervised key-concept extraction, for the French text mining challenge DEFT 2012. KX selects a list of weighted keywords from a document by combining basic linguistic annotations with simple statistical measures. To adapt it to French, a French morphological analyzer (PoS tagger) was added to the extraction pipeline to derive lexical patterns. In addition, parameters such as frequency thresholds for collocation extraction and indicators of key-concept relevance were computed and set on the training documents. In the DEFT 2012 tasks, KX achieved good results (0.27 F1 for Task 1 with terminological list, and 0.19 F1 for Task 2) with limited additional effort for domain and language adaptation.
Keywords: keyword extraction, linguistic patterns, terminology.
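The general idea in the abstract, PoS-based lexical patterns combined with a frequency threshold and a simple weight, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from KX: the tag names, the `PATTERNS` set, the `min_freq` default, and the frequency-times-length weight are all hypothetical, and the input is assumed to be already tokenized and PoS-tagged.

```python
from collections import Counter

# Hypothetical PoS patterns for French noun phrases (NOT KX's actual pattern set).
PATTERNS = {
    ("NOUN",),
    ("NOUN", "ADJ"),
    ("NOUN", "PREP", "NOUN"),
}


def extract_key_concepts(tagged_tokens, min_freq=2, top_n=5):
    """Rank n-grams whose PoS sequence matches a pattern and whose count >= min_freq.

    tagged_tokens: list of (word, pos_tag) pairs, assumed to come from a PoS tagger.
    Returns a list of (phrase, weight) pairs, highest weight first.
    """
    counts = Counter()
    max_len = max(len(p) for p in PATTERNS)
    for start in range(len(tagged_tokens)):
        for length in range(1, max_len + 1):
            window = tagged_tokens[start:start + length]
            if len(window) < length:
                break  # ran past the end of the document
            tags = tuple(tag for _, tag in window)
            if tags in PATTERNS:
                phrase = " ".join(word.lower() for word, _ in window)
                counts[phrase] += 1
    # Illustrative weight (an assumption): frequency times number of words,
    # which favours multiword concepts over their single-word parts.
    scored = [
        (phrase, freq * len(phrase.split()))
        for phrase, freq in counts.items()
        if freq >= min_freq
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


sample = [
    ("extraction", "NOUN"), ("de", "PREP"), ("mots-clés", "NOUN"),
    ("et", "CONJ"),
    ("extraction", "NOUN"), ("de", "PREP"), ("mots-clés", "NOUN"),
    ("automatique", "ADJ"),
]
ranked = extract_key_concepts(sample)
```

On this toy input, the three-word candidate "extraction de mots-clés" ranks first, while "mots-clés automatique" is dropped because it occurs only once and falls below the threshold. A real system would tune the threshold and weighting on training documents, as the abstract describes.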
Similar resources
KX: A Flexible System for Keyphrase eXtraction
In this paper we present KX, a system for keyphrase extraction developed at FBK-IRST, which exploits basic linguistic annotation combined with simple statistical measures to select a list of weighted keywords from a document. The system is flexible in that it offers to the user the possibility of setting parameters such as frequency thresholds for collocation extraction and indicators for keyph...
Private Puncturable PRFs from Standard Lattice Assumptions
A puncturable pseudorandom function (PRF) has a master key k that enables one to evaluate the PRF at all points of the domain, and has a punctured key kx that enables one to evaluate the PRF at all points but one. The punctured key kx reveals no information about the value of the PRF at the punctured point x. Punctured PRFs play an important role in cryptography, especially in applications of i...
French Resources for Extraction and Normalization of Temporal Expressions with HeidelTime
In this paper, we describe the development of French resources for the extraction and normalization of temporal expressions with HeidelTime, an open-source multilingual, cross-domain temporal tagger. HeidelTime extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard. Several types of temporal expressions are extracted: dates, times, durations ...
Data extraction from machine-translated versus original language randomized trial reports: a comparative study
BACKGROUND: Google Translate offers free Web-based translation, but it is unknown whether its translation accuracy is sufficient to use in systematic reviews to mitigate concerns about language bias. METHODS: We compared data extraction from non-English language studies with extraction from translations by Google Translate of 10 studies in each of five languages (Chinese, French, German, Japane...
Knowledge discovery in bibliographic collections using concept hierarchies and visualization tools
This paper presents new methods for knowledge extraction and visualization, applied to datasets selected from the astronomical literature. One of the objectives is to detect correlations between concepts extracted from the documents. Concepts are generally meta-information which may be defined a priori, or may be extracted from the document contents and are organised along domain ontologies or ...